Abstract

Electronic language corpora, and their attendant computer software, are proving increasingly influential in language 
teaching as sources of language descriptions and pedagogical materials. However, few teachers are clear about their 
nature or their relevance to language teaching. This paper defines corpora and their types, discusses their contribution to 
language learning and teaching, and provides examples of their use in class. It also outlines the changes in knowledge, 
skills and attitudes that are needed for learners and teachers to take advantage of the opportunities offered by the 
availability of corpus resources. Finally, the paper discusses the limitations of using corpora in language teaching, and the 
potential pitfalls arising from their uncritical use. Although the paper refers to research and teaching materials and 
procedures relevant to English language teaching (ELT) it addresses issues related to language teaching in general. 


Introduction 

Corpora first came to the attention of most English language teachers in 1987 with the publication of Collins COBUILD English Language 
Dictionary, the first corpus-based dictionary for learners. The following year saw the publication of an influential paper on the use of 
corpus-derived and corpus-based materials in the language classroom (Johns, 1988), although these had been proposed earlier (e.g., 
Higgins & Johns, 1984; Johns, 1986; Leech, 1986; McKay, 1980; Sinclair, 1986). 

Since then, corpus-based language studies and pedagogical materials have grown exponentially; there is already a substantial and ever-growing 
body of corpus-based research on language structure and use, as well as on language learning and teaching (see Biber et al., 1998; 
Hunston, 2002; Kennedy, 1998; McEnery & Wilson, 2001; McEnery et al., 2005, in press; Meyer, 2002; Partington, 1996; Stubbs, 1996, 
2001; Tognini-Bonelli, 2001). [1] ‘Corpus’ has now become one of the new language teaching catchphrases, and teachers and learners 
alike are increasingly becoming consumers of corpus-based educational products, such as dictionaries and grammars. However, few 
teachers are clear about the nature of corpora, or their significance for language teaching, and fewer still have ever made direct use of a 
corpus. The questions most frequently asked by teachers are: What is a corpus? How are corpora relevant to language teaching? How 
can they be used? The first aim of this paper is to answer those questions, provide an outline of the current state of affairs, and give 
examples of corpus types and uses. 

The utility of corpora for language teaching has been questioned from different perspectives. The sceptics have expressed reservations 
about the ability of corpora to capture language use (e.g., Widdowson, 1991), or the usefulness of native-speaker (L1) corpora in providing a 
model for teaching (e.g., Prodromou, 1997), some going so far as to argue that L1 corpora can intimidate learners (Gabbrielli, 1998), or 
disempower teachers (Dellar, 2003). Conversely, the fact that corpus-based studies relevant to language learning concentrate on those 
issues into which the use of corpora can offer insights may be misinterpreted as implying that corpora are the be-all and end-all of language 
teaching. [2] The second aim of this paper, therefore, is to demystify corpora and define their place within language teaching as a whole. 

Corpus-based research and teaching have been carried out predominantly at universities; therefore, teachers in other educational settings 
may think that corpora are not relevant to their teaching situation, or that the knowledge, skills and technology required to integrate 
corpora into their teaching are beyond them. However, there have been articles on how teachers with minimal computer resources can 
make use of corpora (cf. Johns, 1991a, 1991b; Stevens, 1995; Tribble, 1997a, 2000). The third aim of this paper, then, is to demonstrate 
that using corpora is not an either/or option, but that teachers in different contexts can make use of them to different degrees to suit their 
learners and facilities. 

Corpora: Nature and types 

What is a corpus? 

Loosely defined, a corpus is “any body of text” (McEnery & Wilson, 2001, p. 197), that is, any collection of recorded instances of spoken or 
written language. For example, a pile of written assignments (e.g., essays) waiting to be marked is, roughly speaking, a corpus. Let us 
assume that these assignments have been written by students about to start a language course, and that the teacher has not taught the 
students before. The teacher can read the essays to form a general impression of the strengths and needs of the new class, but he/she may 
also want to focus on specific areas of interest. For example, while reading the assignments, the teacher may realise that the learners 
frequently make collocation errors. In order to examine the problem more closely, the teacher can go through the assignments, locate and 
list the unacceptable collocations, and determine whether there are any recurring patterns, that is, whether learners need help with the 
collocations of particular words, perhaps words normally associated with the topic of the assignment. 

In the case of a single class of twenty learners, this analysis might be somewhat time consuming, but it would still be manageable. If, 
however, there were one hundred assignments, the task would become impractical. But if the learners had submitted their 
assignments in electronic form, and if the relevant software were available, the teacher could examine the use of specific collocations in a 
hundred or more scripts in the same time it takes to manually examine twenty. Better still, the teacher could observe more complex and 
detailed patterns, and with greater accuracy. Moreover, this electronic corpus would be a helpful resource for the teacher, as it would be 
available in the future for the examination of other language aspects. The corpus could also grow by the addition of new assignments, in 
which case the teacher could trace the learners’ development in given areas. This is why ‘corpus’ is currently understood as “a body of 
machine-readable text” (McEnery & Wilson, 2001, p. 197). 
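
The kind of examination described above can be illustrated with a minimal sketch. It assumes the assignments are available as plain-text strings; the function name `collocates`, the two-word window, and the two learner sentences are hypothetical, invented purely for illustration, and dedicated corpus software offers far more than this.

```python
import re
from collections import Counter

def collocates(texts, node, window=2):
    """Count the words found within `window` tokens of `node` in all texts."""
    counts = Counter()
    for text in texts:
        # Crude tokenisation: lower-case alphabetic strings only (an assumption).
        tokens = re.findall(r"[a-z']+", text.lower())
        for i, token in enumerate(tokens):
            if token == node:
                left = tokens[max(0, i - window):i]
                right = tokens[i + 1:i + 1 + window]
                counts.update(left + right)
    return counts

# Hypothetical learner sentences, not taken from any real corpus.
essays = [
    "I did a mistake in my homework and I did another mistake in the test.",
    "She made a mistake, but everyone can make a mistake.",
]

for word, frequency in collocates(essays, "mistake").most_common(5):
    print(word, frequency)
```

With a hundred scripts in the list instead of two, the same call would run unchanged, which is the practical advantage of electronic storage noted above.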

Imagine that at the end of the course our hypothetical teacher decides to summarise his/her findings on the learners’ use of collocations 
and present them at a conference or in an article. How helpful would the findings be to teachers in other contexts? In other words, how 
valid would it be to generalise from these findings? Such a presentation or article would be useful, but obviously any conclusions should be 
treated with caution, because the findings would only reflect the specific group of learners, taught by the specific teacher, in the specific 
geographical and social context. Also, the findings would reflect the use of collocation in the learners’ writing rather than their speech, and 
their use in specific text types. 

If the corpus contained texts by learners from all over a particular region, then it would be possible to draw more reliable conclusions. Still, 
the corpus compilers would need to include texts written by learners of the same level, and ensure that the texts were of the same type and 
on the same topics. In other words, the corpus would need to be representative of the type of learners and texts that they wanted to 
examine (see Biber, 1993). Also, as it would not be feasible or practical to collect texts by all the learners of the same level in the region, 
the corpus compilers would have to select a sample of texts from each class. 

The same principles apply to native-speaker corpora. ‘A corpus of English’ raises the question, ‘Which variety of English?’ Even if we 
restricted ourselves to one variety (e.g., American or British English), it would be impossible to create a corpus of the whole language, not 
least because language evolves continuously. We can only collect a sample, and strive to make this sample as representative as possible. 

This leads us to the stricter and much more helpful definition of a corpus as “a finite collection of machine-readable texts, sampled to be 
maximally representative of a language or variety” (McEnery & Wilson, 2001, p. 197). 

Types of corpora 

Corpora come in many shapes and sizes, because they are built to serve different purposes. [3] There are two philosophies behind their 
design, leading to the distinction between reference and monitor corpora. Reference corpora have a fixed size; that is, they are not 
expandable (e.g., the British National Corpus), whereas monitor corpora are expandable; that is, texts are continuously being added (e.g., 
the Bank of English). Another design-related distinction is whether a corpus contains whole texts, or merely samples of a specified length. 
The latter option allows a greater variety of texts to be included in a corpus of a given size. 

In terms of content, corpora can be either general, that is, attempt to reflect a specific language or variety in all its contexts of use (e.g., the 
American National Corpus), or specialised, that is, aim to focus on specific contexts and users (e.g., Michigan Corpus of Academic Spoken 
English), and they can contain written or spoken language. Corpora can also represent the different varieties of a single language. For 
example, the International Corpus of English (ICE) contains one-million-word corpora representative of different varieties of English 
(British, Indian, Singaporean, etc.). As implied in the previous section, corpora may contain language produced by native or non-native 
speakers (usually learners). Finally, corpora can be monolingual (i.e., contain samples of only one language), or multilingual. Multilingual 
corpora are of two types: they can contain the same text-types in different languages, or they can contain the same texts translated into 
different languages, in which case they are also known as parallel corpora (Hunston, 2002; Kennedy, 1998; McEnery & Wilson, 2001; 
Meyer, 2002). 

Creating a useful corpus 

First, the texts a corpus is to contain are selected and stored in electronic format. Written texts, if they are not already in electronic form 
(e.g., downloaded from the Internet, submitted by learners on a disc or CD-ROM, or sent by e-mail), must be scanned; spoken texts must 
be recorded and transcribed. [4] The result of this stage is a raw corpus. Although a raw corpus can yield some information about language 
use, its usefulness is limited. For example, although the frequency of the word drive in the raw corpus can be determined, we will not know 
how many times it occurs as a noun and how many as a verb. Of course, different instances could be counted manually, but this would defeat 
the purpose of compiling a corpus. 
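
The limitation can be seen in a short sketch, given here under stated assumptions: the three sentences standing in for a raw corpus are invented, and tokenisation is reduced to extracting lower-case alphabetic strings. A frequency count finds every instance of drive, but nothing in the raw text distinguishes the noun from the verb.

```python
import re
from collections import Counter

# A tiny, invented stand-in for a raw (unannotated) corpus.
raw_corpus = (
    "We went for a drive along the coast. "
    "They drive to work every day. "
    "The drive took two hours."
)

tokens = re.findall(r"[a-z]+", raw_corpus.lower())
frequencies = Counter(tokens)

# Total frequency of the form 'drive', nouns and verbs lumped together.
print(frequencies["drive"])  # 3
```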

The utility and flexibility of a corpus can be increased by adding coding that a computer can recognise. Labels (or tags) are attached to the 
words, phrases, sentences, paragraphs, sections, or to entire texts in the corpus. Information related to non-linguistic properties of the 
texts is referred to as mark-up. Mark-up may give information about the source of the text (e.g., book, newspaper), the date of publication 
or broadcast, the author or participants, or text sections (e.g., introduction, conclusion). Information related to the linguistic properties of 
the texts in the corpus is called annotation. Most L1 corpora are annotated for the part of speech and form of the words (e.g., 
singular/plural, present/past tense). This type of annotation is also called grammatical annotation, or tagging. For example, the word 
teaching would be tagged ‘teaching_VVG’ if it was a present participle (as in ‘she was teaching’), and ‘teaching_NN1’ if it was used as a 
noun (as in ‘language teaching’). Corpora can also be annotated for lexical sense (e.g., lexis denoting belief, expectation) and pragmatic 
function (e.g., request, invitation). [5] What kind of mark-up or annotation is added to a corpus is determined by the information to be 
extracted. Sample 1 shows the three questions asked in the second paragraph of this article, annotated for part of speech. [6] 


What_DDQ is_VBZ a_AT1 corpus_NN1 ?_? 

How_RRQ are_VBR corpora_NN2 relevant_JJ to_II language_NN1 
teaching_NN1 ?_? 

How_RRQ can_VM they_PPHS2 be_VBI used_VVN ?_? 


Sample 1. Example of annotation for parts of speech 
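
A brief sketch may show why such annotation is machine-usable. It assumes the word_TAG format of Sample 1, with the tags read as AT1, NN1 and VVN; the function name `parse_tagged` and the grouping step are illustrative assumptions, not a description of any particular tagger or corpus tool.

```python
from collections import defaultdict

def parse_tagged(text):
    """Split whitespace-separated 'word_TAG' tokens into (word, tag) pairs."""
    pairs = []
    for token in text.split():
        # Split on the last underscore, so that '?_?' yields ('?', '?').
        word, _, tag = token.rpartition("_")
        pairs.append((word, tag))
    return pairs

# The three questions of Sample 1, as read by the present sketch.
sample = (
    "What_DDQ is_VBZ a_AT1 corpus_NN1 ?_? "
    "How_RRQ are_VBR corpora_NN2 relevant_JJ to_II language_NN1 teaching_NN1 ?_? "
    "How_RRQ can_VM they_PPHS2 be_VBI used_VVN ?_?"
)

by_tag = defaultdict(list)
for word, tag in parse_tagged(sample):
    by_tag[tag].append(word)

# All singular common nouns in the sample.
print(by_tag["NN1"])  # ['corpus', 'language', 'teaching']
```

Once words carry tags, a query such as "teaching as a noun" reduces to a lookup, which a raw corpus cannot support.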

How are corpora relevant to language teaching? 

Corpus use contributes to language teaching in a number of ways (Aston, 2000; Leech, 1997; Nesselhauf, 2004). The insights derived from 
native-speaker corpora contribute to a more accurate language description, which then feeds into the compilation of pedagogical grammars 
and dictionaries (Hunston & Francis, 1998, 1999; Kennedy, 1992; Meyer, 1991; Owen, 1993). The analysis of learner language provides 
insights into learner needs in different contexts, which then inform learner dictionaries and grammars. Research on learner corpora also 
contributes to our understanding of language learning processes (Granger et al., 2002). Corpora of language teaching coursebooks enable 
the examination of the language to which learners are exposed, and, when compared to L1 corpora, facilitate the development of more 
effective pedagogical materials. Learner corpora have the potential to contribute to the construction and evaluation of language tests in a 
multitude of ways (see Alderson, 1996); however, this potential has remained underexploited (but see Ball, 2001; Barker, 2004). Finally, 
both native-speaker and learner corpora can themselves be used as learning/teaching materials (Aston, 1997; Aston et al., 2004; Johns, 
1991a; Kettemann, 1995). Figure 1 summarises the interconnecting ways in which corpora are relevant to language teaching (adapted from 
McEnery & Gabrielatos, 2005, forthcoming). 
Figure 1. Corpora and ELT 


We will now turn to the contribution of corpora to language teaching in more detail. 

Language description 

The use of L1 corpora in linguistic research has provided the most convincing evidence of discrepancies between actual use and traditional, 
introspection-based views on language (Sinclair, 1997, pp. 32-34), and has revealed patterns that had not been detected by introspection. 
This is pertinent to language teaching, as the information about language structure and use that learners receive, whether through 
pedagogical materials or teachers, is still largely based on introspection. 

Helpful as it may be, introspection is not always reliable. Being a native speaker does not automatically mean that a user has a conscious, 
clear, and comprehensive picture of the language in all its contexts of use, nor do all native speakers share the exact same intuitions. A 
good example is the claim by a native-speaker teacher that in English, “question tags, along with bowler hats, mostly belong to 1960s BBC 
broadcasts” (Bradford, 2002, p. 13). This view is contradicted by the findings of Biber et al. (1999, p. 211), based on the examination of the 
40-million-word Longman Spoken and Written English Corpus, who report that “about every fourth question in conversation is a question 
tag.” 

It is, of course, very helpful to examine the intuitions of native speakers and elicit the different alternatives they find acceptable, or can 
generate by manipulating their language. It is equally helpful, however, to examine which of these alternatives native speakers actually use, 
and in what contexts and frequency. The discrepancy between intuitions and attested use indicates that when the language information 
learners are given is based only on intuitions, and when the examples and texts used in class are chosen to reflect these intuitions, then 
teachers and materials writers may unwittingly present their personal informal observations about language as the true and full picture of 
language structure and use, or present their own preferred usage as the only ‘correct’ or ‘acceptable’ one. The importance of corpus-informed 
pedagogical materials becomes more evident if we take into account that “to a great extent, the course-book can be considered to 
be the learners’ ‘corpus’” (Gabrielatos, 1994a, p. 14). 

Corpus-based research has also revealed the inadequacy of many of the rules that still dominate ELT materials. For example, in a study of 
a random sample of 710 if-conditionals [7] from the written section of the BNC, the conditional sentences were examined against the 
information about form, time orientation and attitude to likelihood given within the currently favoured framework of five types (zero, first, 
second, third and mixed). The rules presented in fifteen recent intermediate-to-advanced coursebooks, taken collectively, accounted for 
only 44% of the sentences (Gabrielatos, 2003b). [8] 

This section has highlighted the first important contribution of corpus-based research to language teaching, namely more accurate 
descriptions of English, which in turn can inform reference books and pedagogical materials (Hahn, 2000; Mindt, 1997). The language 
insights derived from corpora go beyond questions of correct or natural use, and provide additional details about the frequency of 
particular language features in specific contexts. 

Examining learner language 

Strange as it may sound, every single teacher has used a learner corpus, in the loose definition, if only in an informal and intuitive way. 
Teachers routinely write end-of-course reports, or answer questions about a learner’s strengths and needs. How are they able to do so? To 
use corpus terminology, each learner’s performance during the course is used to compile what we may call a mental corpus, which is 
consulted when evaluating a learner. The same applies when assigning an impression mark to a piece of writing or a task performance. 
Using language corpora allows teachers to be much more precise in examining learner language and identifying needs than just forming an 
overall impression, because corpus use enables teachers to examine particular areas in detail, or annotate for specific learner errors 
(Granger, 1999). 

In general, studies on learner language focus on the over/under use of specific features in different contexts in comparison to native-speaker 
use, and the analysis and categorisation of learner errors. Error analysis may deal with frequent or common errors, or error 
patterns, according to the learners’ L1, level and age, the medium of production (speech or writing), or the context of use (e.g., homework, 
test), while taking into account factors such as task and text type. Studies using learner corpora have focused on diverse aspects of learner 
language, mainly in writing. Examples of areas that have been examined with the help of language corpora are the use of lexical chunks (De 
Cock et al., 1998), collocations (Nesselhauf, 2005), complement clauses (Biber & Reppen, 1998), the progressive and questions (Virtanen, 
1997, 1998), overstatement (Lorenz, 1998), connectors (Altenberg & Tapper, 1998), speech-like elements in writing (Granger & Rayson, 
1998), and epistemic modality (McEnery & Kifle, 2002). 

One area of language teaching which has interested corpus researchers is English for Specific/Special Purposes (ESP), especially English for 
Academic Purposes (EAP). [9] The areas that have most attracted corpus-based research are those of scientific and academic writing, often 
with a view to the implications for teaching (Coxhead, 2002; Flowerdew, 2002). In scientific/academic writing, the term ‘learner’ can be 
interpreted in two ways: a learner of the language system as a whole, or a learner of the style and conventions of academic writing. It is 
interesting that the latter applies to non-native speakers (NNS) and native speakers (NS) alike, in that both groups are, in several respects, 
approached as “trainee academics,” whose writing is “compared to that of established writers as evidenced in the discourse of 
published papers” (Gabrielatos & McEnery, 2005, in press, p. 312). The blurring of the NS-NNS distinction, as far as academic writing is 
concerned, is better understood if we consider that NNS who have published academic/scientific papers must be considered as “established 
writers” (Gabrielatos & McEnery, 2005, in press, p. 312; see also Lucas et al., 2003). Studies on academic and scientific writing have 
focused on language features, such as directives (Hyland, 2002), modality (Hyland & Milton, 1997; Thompson, 2002), or collocations 
(Gledhill, 2000; Luzon Marco, 2000), as well as the conventions of academic writing, such as citation practices (Harwood, 2004; Hyland, 
1999; Thompson & Tribble, 2001). Finally, corpora can be used to detect plagiarism in student essays (Atwell et al., 2003; Lyon et al., 
2004; van Halteren, 2003). [10] 

The contribution of such studies is two-fold. By examining learner language, we can define areas that need special attention in specific 
contexts and at different levels of competence, and so devise syllabi and materials. The analysis of learner language can also provide 
insights into the process of language learning (Bekiou & Diaz, 2004; Tono, 2000). 

Corpora, language exposure, intuitions and generalisations 

A corpus in the mind? 

Intuition, or ‘a feel for the language,’ is what learners aim to develop. Native speakers develop that ‘feel’ partly through exposure to 
language in use and the recognition of patterns. Through this exposure, native speakers build the mental equivalent of a corpus (Bod, 
1998). Intuitions can be seen as the results of the informal analysis of this mental corpus. It follows, then, that by working on 
representative examples from language corpora, learners will be helped to recognise recurring patterns of structure and meaning. As Stern 
states, language learners need to be helped “to see a particular feature ... not merely as an isolated item but as part of an evolving system of 
interrelationships which should become increasingly differentiated as it grows” (1992, p. 145). The wealth of instances of use of a specific 
item that corpora provide can offer the amount of evidence required for learners to refine their perception of it. 

Pattern recognition, generalisations and rules 

This section will first use a visual example to illustrate how pattern recognition works, and then discuss the implications for language 
teaching and the use of corpora, with particular regard to the formulation of pedagogical rules. We will assume that the images used in this 
example represent a specific language feature, such as the use of a grammatical structure, or the collocational behaviour of a word. We will 
also assume that we wish to establish the behaviour of the feature by examining a small number of language examples. On the strength of 
the analysis of this sample, we recognise a regular pattern (Figure 2). 
Figure 2. 

In traditional language teaching fashion, we could formulate a rule. However, it might be that when more examples are added to the 
sample, some irregularities emerge (Figure 3a). 



In the light of the new evidence, we could formulate a list of exceptions to our rule (Figure 3b). 



Figure 3b. 

Let us assume that, over time, we come to observe more instances of the language item in question, or, in corpus terms, that we examine a 
larger sample, and that our observations reveal even more irregularities to the initial pattern, or, in language teaching terms, more 
exceptions to the rule (Figure 4). 

































Figure 4. 

On the face of the evidence at this point, two alternatives exist: First, we can conclude that the particular language feature is “illogical,” and 
that even if a rule could be formulated, it would inevitably have a disproportionate number of exceptions. Second, we could become 
suspicious of the fact that the exceptions cover more instances than the rule, and tentatively conclude that the fault lies with the rule, not 
the language. We could then hypothesise that what we have observed is only a part of a different pattern from the one initially perceived, a 
pattern that may be larger and more complex. If we adopt the second alternative, the next logical step is to further increase the size of the 
sample (Figure 5). 



The larger sample seems to reveal a new pattern. However, in the light of previous experience, this time we are not so quick to draw 
conclusions or formulate rules. Since the larger the sample, the more valid the conclusions, we considerably increase the sample size to test 
our new hypothesis (Figure 6). 





























Figure 6. 


Observing the pattern repeat itself (Figure 6), we are now in a much better position to formulate dependable generalisations about the 
language item in question. However, caution is needed regarding how these generalisations are delimited and phrased (cf. Close, 1992, pp. 
2-11; Leech, 1994; Swan, 1994; Westney, 1994). [11] The delimitation of generalisations relates to a number of important parameters that 
must be considered: 

1. The medium; that is, whether the sample contains only speech or only writing, or both. 

2. The context of use, that is, “the physical, social and psychological background in which language is used” (Gabrielatos, 1999, p. 15). 
The main contextual elements are the topic, the writer’s or speaker’s purpose, the type of text or interaction, the audience or 
participants and their relationship. 

3. The co-text, that is, the surrounding text or linguistic neighbourhood of the feature, as words and structures seem to both attract, and 
interact with, one another. 

4. The representativeness of the sample; in other words, the collection of texts needs to represent a microcosm of the language use of 
the population under investigation. 

5. The size of the sample; as the example demonstrated, language patterns may be too large and complex for a small sample to reveal 
adequately. 

It would be rash to make broad statements about the behaviour of a language feature without reference to these parameters. As far as 
language teaching is concerned, exceptions and special cases are usually the result of overgeneralisations that do not take into account the 
parameters outlined above, or rules formulated on the basis of inadequate or selective evidence. 

Corpora and condensed language exposure 

Language learners in countries where the target language is not widely spoken often lack opportunities for the rich language exposure that 
is essential for developing the ability to recognise patterns. Extensive reading (Nation, 1997; Susser & Robb, 1990) is believed to facilitate 
language learning, because it exposes learners to real language use in context, and in amounts far larger than the short texts and dialogues 
usually preferred for the presentation of new language items. Extensive reading is also regarded as an effective way to help language 
learners develop intuitions as native speakers do (Krashen, 2004). The pattern-recognition example in the previous section gives an 
indication of how focused language exposure can be used actively, in order to formulate intuitions about language use. 

Representative corpora can offer condensed exposure to language patterns. It is not argued here that corpora should be the sole vehicle for 
the development of reading skills and strategies, [12] nor is it argued that corpus use can replace out-of-class reading. Rather, what is being 
suggested is an approach that shares characteristics of both intensive and extensive reading, which might be called condensed reading. The 
reading of corpus samples is intensive in the sense that learners focus on the behaviour of specific language features; it is extensive in the 
sense that learners examine language features in a larger number of texts than in conventional text-based techniques. Condensed reading 
enables learners to engage with language use in context in order to formulate and check, though not necessarily consciously, hypotheses 
about language structure and use. 

One printed page contains 500 words on average. [13] The British National Corpus contains 90 million written words, or the equivalent of 
approximately 180,000 pages. A six-year language teaching programme of five one-hour lessons per week amounts to a total of about 
1,000 lessons. To gain exposure through reading to the amount of language evidence contained in a 90 million word corpus, a learner 
would need to examine about 180 pages per lesson (in the case of classroom or intensive reading), or read about 80 pages every day of the 
year for six years (in the case of out-of-class or extensive reading), the equivalent of two to three books per week. 
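
The arithmetic behind these estimates can be checked with the short calculation below. The figures are those given in the paragraph above; the constant names are introduced here for illustration only, and a year is assumed to have 365 days.

```python
WORDS_PER_PAGE = 500          # average printed page, as stated above
CORPUS_WORDS = 90_000_000     # written component of the BNC, as stated above
LESSONS = 1_000               # approximate lessons in a six-year programme
YEARS = 6

pages = CORPUS_WORDS / WORDS_PER_PAGE       # 180,000 pages
pages_per_lesson = pages / LESSONS          # 180 pages per lesson
pages_per_day = pages / (YEARS * 365)       # roughly 82, i.e. "about 80"

print(pages, pages_per_lesson, round(pages_per_day))
```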

Through corpora, learners will experience types of texts that they may not choose to read out of class, or that teachers and materials writers 
may not deem appropriate. It seems clear, then, that learners may benefit from using corpora in addition to pedagogical materials and 
authentic texts. [14] The considerations listed here also highlight the limitations of pedagogies that avoid the use of materials and a pre-planned 
focus on language, such as the ELT translation of Dogme (Thornbury, 2000). These approaches tend to favour class discussions 
loosely structured around topics, with the teacher and learners acting as the main, or even sole, sources of language exposure. In doing so, 
they offer limited exposure to language, which is usually further restricted to the teacher’s language variety and preferred usage. 


Corpora in the classroom 

Before examining ways in which corpora can be used as (sources of) classroom materials, we need to clarify that a data-driven, awareness-raising 
approach is not necessarily linked to the use of corpora. Teachers can use texts containing the target language features and, through 
awareness-raising tasks, guide learners to discover the behaviour of lexical, grammatical or discourse elements. Therefore, it would be 
helpful to distinguish between text-based and corpus-based approaches to data-driven learning. [15] 

Corpora can be used in language teaching in two ways (Leech, 1997, p. 10): The soft version requires only the teacher to have access to, 
and the skills to use, a corpus and the relevant software. The teacher prints out examples from the corpus and devises the tasks. Learners 
work with these corpus-derived and corpus-based materials (Bernardini, 2004; Granger & Tribble, 1998; Osbourne, 2000; Tribble, 1997b; 
Tribble & Jones, 1990). Usually corpus examples are in the form of a concordance, where the word or structure being examined in the task 
is in the middle, so that patterns are more easily discernible (see Sample 2). The hard version requires learners to have direct access to 
computer and corpus facilities and have the skills to use them (Aston, 1996). Tasks can be devised by the teacher (Tognini-Bonelli, 2001), 
contained within a CALL programme (Hughes, 1997; Milton, 1998), or chosen by the learners, with or without the teacher’s guidance 
(Bernardini, 2002). 
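
To make the notion of a concordance concrete, the following is a minimal sketch of a keyword-in-context (KWIC) display. It is not the output of any particular concordancing package: the function name `kwic`, the context width, and the short text are hypothetical, chosen only to show how aligning the search word in a central column makes its neighbours easy to scan.

```python
import re

def kwic(text, node, width=30):
    """Return concordance lines with `node` aligned in a central column."""
    lines = []
    pattern = r"\b%s\b" % re.escape(node)
    for match in re.finditer(pattern, text, flags=re.IGNORECASE):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so the node always starts in the same column.
        lines.append(f"{left:>{width}} {match.group()} {right:<{width}}")
    return lines

# An invented text, used purely for illustration.
text = (
    "A balanced diet matters. Many people go on a diet in January. "
    "Few of them stay on a diet for long."
)

for line in kwic(text, "diet", width=20):
    print(line)
```

In the soft version, a teacher could print lines of this kind and build tasks around them; in the hard version, learners would run such searches themselves.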

Taking into consideration the aims of a lesson, the design or selection of materials and the management of learning, in relation to teachers 
and learners, we can define combinations that cover the spectrum from totally teacher-centred to totally learner-centred. At the teacher-centred 
end, the teacher decides on the aims of the lesson, selects/designs the materials and manages the lesson. At the learner-centred 
end, the learner decides on all three, with the teacher or computer programme acting as facilitator and guide. Of course, there can be 
intermediate combinations, particularly when decisions are taken collaboratively between teacher and learners. 

Soft version: Four examples 

Example 1. Comparing text-based and corpus-based approaches to teaching collocations 

This example shows how a text-based data-driven approach could be used to teach collocations of the noun diet to a group of intermediate-level
learners. Because class time is limited, a long text or a small number of short texts could be used. Also, it would be wise to focus on a
specific collocation pattern: for example, only collocations of the noun diet in the singular with verbs, phrasal verbs, or expressions
containing verbs.

When selecting suitable texts, it becomes clear that it is difficult to find authentic texts which are ‘about diet,’ as they have not been written 
for language teaching purposes. The three texts chosen for this example [16] gave advice on dieting or reported on dieting experience. 
Although the texts are long for a typical 60- or 90-minute lesson (they total 2,250 words), they contain only 12 instances of the noun diet,
and only 5 collocations with verbs, 2 of which are with the same phrasal verb (Sample 2). [17]
               NO! The Eat and Burn  diet  identifies over 100 foods that tur
11                   The Eat & Burn  diet  is easy to follow. You don’t feel
12  concept behind The Eat and Burn  diet  is this: eat foods that safely fore

Sample 2. Concordance of the noun diet in the three texts 


It does not seem worthwhile to spend the time it takes learners to read more than 2,000 words to teach only four collocations, particularly 
if it is uncertain whether these are among the most frequent ones. 

This example illustrates a problem with text-based approaches: authentic texts do not conveniently contain enough instances of the 
patterns or structures on which teachers may want to focus. Additionally, a given text cannot be expected to necessarily contain the most 
frequent patterns, or to contain them in proportions that reflect their overall frequency in language use. Until recently, the only solution to 
this problem was to write texts specifically for pedagogical use, so that a sufficient number of instances of the target language features 
could be included. However, such texts tend to contain the target language features in unnatural proportions. It could be argued that if the 
frequency of the target features in pedagogical texts reflected actual use, that is, if the content of the texts was informed by corpus data, 
then these texts would be good teaching tools. Unfortunately, this is not the case. As the example above indicates, it is unlikely that every 
single text will reflect the overall frequency of a word, pattern or structure. Consequently, these putative corpus-informed pedagogical texts 
would be too densely, and so unnaturally, packed with particular features. The process of incorporating an unnatural number of specific
language items into the texts affects other elements of discourse. The result is a text that is as inauthentic as the traditional pedagogical 
texts and dialogues. 

The same collocation pattern could be approached using a concordance from the BNC. One advantage of using a corpus is that the 
frequency of patterns can be expected to reflect real language use. Another benefit is that more detailed patterns can be investigated. For 
instance, learners may be presented with two sets of examples to examine: one with the pattern ‘verb + preposition + any word + diet’, and
one with ‘verb + article + diet’ (Samples 3 and 4 respectively). [18]


      try other foods, although I advise against a  diet  of all dried food.
 might have had recently could be affected by your  diet  , or alcohol and cigarette consumption.
             or whatever-of British women are on a  diet  at any one time, but that, as a nation
    was breast-feeding, her doctor asked about her  diet  and found that she was a vegetarian. She
          margarines, and are best avoided on this  diet  programme.
                    Ve been up when I’ve been on a  diet  , I mean smelling, smoked out, smoked out on
                 the end. When he had been on this  diet  for ten days he was tested with various foods.
her child demands breastfeeding despite being on a  diet  of solids. Concerns about dehydration if the child
              feels it will be able to cook with a  diet  sweetener."
         commune in Vancouver, Canada and fed on a  diet  of black pudding and Ecstasy. If I were to
       legumes, meat hardly ever figuring in their  diet  . On as little as 8/6d (42 new pence
           testing. He gradually forgets about the  diet  . This pitfall can be avoided by ensuring that
      with the new knowledge I had gained about my  diet  I was eating sensibly, I no longer crave sweet foods
                  last saw you. You should go on a  diet  . Exercise more. Edouard rides every morning
                the purpose of its use (to go on a  diet  , or to exclude certain elements such as meat)?
                                    "I can go on a  diet  when I grow up," I said, but I was
        scale is the seasonal dieter who goes on a  diet  in spring to get rid of the Christmas over-indulgence;
                                After going on the  diet  ask yourself these questions again-you may be
    our current eating habits and including in our  diet  the necessary changes that are required to maintain a
           You can help by getting involved in her  diet  , preparing healthy, balanced meals and emphasising that
        unlikely that you are going to keep to the  diet  for very long. However, here is the vital
                    More people are killed by poor  diet  than by smoking, alcohol, drugs, accidents and
          . The opposition can no longer live on a  diet  of anti-Thatcherism. They face a prime minister,
               daren’t eat chewing gum if I’m on a  diet  Oh you don’t need to
   be suspect and was temporarily omitted from the  diet  . The patient returned to eating only foods that
 for salt becomes less as you progress through the  diet  programme.
        . Arthritic working-class guys raised on a  diet  of fish and chips and fags; they died of
                                    If you’re on a  diet  and you’ve found that you’ve hit problems, ring
                             She’s erm, she’s on a  diet  . Oh really? She’s lost
                             That’s why she’s on a  diet  ! Cos she doesn’t
                 enjoyed it. It didn’t seem like a  diet  -in fact, if there was one sentence that
 enough how important it is to set aside from your  diet  foods you suspect or know cause you problems.
               The more you are able to stick to a  diet  of natural foods-fruit and vegetables (raw if
     will have heard that if you do stick _WI to a  diet  and lose weight, then your metabolism will drop so
     was losing her will power to stick _WI to her  diet  . Anne had already trimmed down to a reasonable
   metabolic rate. Providing you stick _WB to your  diet  , and don’t consume lost of extra calories,
            be so easy that you can stick with the  diet  until all the weight is off.
   Stage II must immediately be struck out of your  diet  . This is very important. Failure to
   to the calorie-counting method, supervised by a  diet  club, dietitian, or doctor.
               , it dwelt in woods, surviving on a  diet  of maize, fruit and grass. The north American
        experiment in which a doctor switched to a  diet  including the average adult consumption of the country’s
            it’s fat, and you should think about a  diet  . But don’t be bullied by precise,
      like your neighbour’s before he went on that  diet

Sample 3. verb + preposition + any word + diet (concordance view)



1. Therefore, adapt the diet according to your lifestyle, to your personal diet. 

2. Altering the diet is also far more risky for a child than it is for an adult, so there are more difficult decisions to be made before embarking on an 
elimination diet. 

3. Wholegrain cereals are also good for the same reasons, and protein foods (meat, fish, eggs and cheese) taken in moderation help to balance the diet
and give all the necessary nutrients. 

4. Even before you begin the diet, notice how fast you eat, and slow down.

5. You need to consider those antecedent events that prompt you to break a diet, and then think about which of these things you can avoid or change in
some way.

6. The final events that lead our dieter to break the diet are quite concrete, namely, walking into a cafe, seeing and smelling the pastries, and seeing
other people happily enjoying them.

7. I could manage to lose five or six pounds and then I would break the diet, go back to normal food and put all the weight back on again.

8. Unfortunately the punishment for breaking a diet is also in-built; you put weight back on.

9. What if I get a reaction to a particular food after I have completed the diet?

10. Treatment was not merely a matter of prescribing herbal medicines, but a whole regimen which controlled the diet and the life-style. 

11. Maybe it is due to my always having eaten a diet rich in red meat and saturated animal fats? 

12. In addition, oily fish is a rich source of the Omega-3 fatty acids and recent medical research suggests that there are a number of health benefits to be 
gained from eating a diet rich in these fatty acids. 

13. During the final two-week period you will be eating a diet composed of the foods you have selected through trial and error in the preceding four 
weeks. 

14. Once you have established a diet on which the child remains well, be careful not to allow too much of any one food. 

15. But, as RICHARD BATH discovers, England’s appointment of coach DICK BEST for less than a year means that, instead of a bright new era, we can 
expect a diet of pragmatism and playing the percentages. 

16. For example, it can be argued the expansion into amalgamated police units has enlarged the organization to a point where it is no longer accessible to 
the man in the street; alternatively, it may be that the use of a centralized computer and complex technical aids has alienated the public even at the 
same time they are increasingly fed a diet of violent news snippets which reinforce a fear of crime and generate another "folk devil" of criminal 
menace, which demands the impossible: a policeman on every corner. 

17. In contrast, those abroad, notably in the West, who were fed the diet of stage-managed events, found his assassination both momentous and 
incomprehensible. 

18. Feed a diet of insects, worms, plant matter, flake food and freeze dried food. 

19. What we are going to do is find a diet that not only helps you to achieve effective weight loss, but is really healthy, suits your individual needs, and 
can be followed for years to come in order to maintain the weight and shape you want. 

20. Therefore when you finish a diet, returning to eating an average amount, the amount you used to eat to maintain a constant weight, will result in you
putting on fat. 

21. But I doubt very much whether there are any claims now outstanding which are not statute-barred, in respect of children stillborn before 22 July 1976 
or any children born before that date, who are locked in litigation with their mothers over whether the mother tasted alcohol or followed a diet other 
than that recommended by the current phase of medical opinion during pregnancy. 

22. Nevertheless, I now had 120 people, 116 women and 4 men, who had followed the diet for a full eight-week period. 

23. I could fill a book with the other similar comments which were written on the questionnaires but I think we can take it as read that the trials proved 
beyond doubt that if you followed the diet moderately strictly you could definitely lose inches from parts usually untouched by normal dieting 
methods. 

24. You will essentially be following the diet in Stage I, but adding to it any food or drink you now have listed in column 3 of the Grand Review Chart on 
page 228. 

25. So I thought, "I’ll invent a diet where you feel good and you can eat." 

26. You may prefer to keep the diet simple during your working week and to save the more elaborate meal for the weekends of, if you feel really 
adventurous, for dinner parties. 

27. Unless you absolutely hate cooking, or are just too busy, it is preferable that you experiment with some of the recipes in order to keep the diet 
interesting. 

28. Unless you absolutely hat cooking it is advisable to experiment with some of the recipes in order to keep the diet interesting. 

29. The answer, therefore, is to maintain a diet which is as balanced and healthy as possible. 



30. Ethnic minorities will have the right to obtain the diet required by their religious beliefs. 

31. I remain nervously aboard, to hear doctors exchanging advice on every deck: Doctor McRae recommends a diet of rice and yoghurt. 

32. For example, the origin of ivory can be identified by its strontium isotopic composition, which reflects the diet of the elephant. 

33. In no circumstances should you do this without help and advice from your doctor-restricting the diet of small children can be very dangerous.

34. Some years ago Maisie had swallowed a whole bottle of vitamin pills and, although Henry had suggested that in his view Maisie’s stomach could 
probably have stood a diet of broken glass, aspirin and raw steak, Elinor had insisted on ringing Charing Cross Hospital. 

35. Just having a little treat here or there can add up to enough to stop the diet from working. 

36. The choice of diet-Two dietary factors- supplementing the diet with wheat bran and reducing the intake of refined carbohydrates-reduce 
cholesterol saturation of bile in subjects with supersaturated bile. 

37. Robin-Anne nodded, but was too busy eating to take much notice of her brother, though she did manage to mumble that she thought the diet soda 
was really kind of good. 

38. A software engineer who freely admits to being plump and has tried every diet in the book claims to have invented a programme that’s guaranteed to 
keep those extra pounds off. 

39. Some of Dr Gerson’s patients-including those with TB-tried the diet.

40. My exercise class students witnessed the low fat diet’s remarkable effect on my body (I had lost only 6lbs [2.7kg] but all from my problem areas) and
then they tried the diet with similar benefits.

41. Gerald was asked to try a diet containing no sugar or white flour and was given an anti-fungal drug. Nystatin.

42. In his forties it grew worse and he decided to see a specialist When Alan mentioned that he had taken a lot of antibiotics just before the urticaria began,
the specialist suggested that he try a diet with no sugar and very little starch.

43. You are welcome to vary the diet, but do make sure you eat other foods besides chocolate this week!


Sample 4. VERB + ARTICLE + diet (sentence view) 




Another important benefit of corpus use becomes apparent if we compare the number of different collocational patterns contained in the
texts and corpus samples [19] in the example (Table 2).



                  No. of words    No. of patterns
Texts             2,250           5
Corpus samples    2,000           59


Table 2. Comparison between texts and concordances 
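
The figures in Table 2 can also be expressed as densities. The short Python calculation below is only a worked check of the arithmetic; the word and pattern counts come from Table 2, while the variable and function names are illustrative.

```python
# Word and pattern counts taken from Table 2.
samples = {
    "Texts": {"words": 2250, "patterns": 5},
    "Corpus samples": {"words": 2000, "patterns": 59},
}


def per_thousand_words(words, patterns):
    """Number of distinct collocational patterns per 1,000 running words."""
    return patterns / words * 1000


density = {
    name: per_thousand_words(counts["words"], counts["patterns"])
    for name, counts in samples.items()
}

# How many times more patterns the corpus samples contain than the texts.
pattern_ratio = samples["Corpus samples"]["patterns"] / samples["Texts"]["patterns"]

for name, value in density.items():
    print(f"{name}: {value:.1f} patterns per 1,000 words")
print(f"Pattern ratio: {pattern_ratio:.1f}")
```

On these figures the texts yield about 2.2 patterns per 1,000 words and the corpus samples 29.5, and the corpus samples contain 11.8 times as many patterns as the texts.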

Although the texts and corpus samples have roughly the same number of words, the corpus samples contain twelve times the number of 
patterns. As mentioned above, pedagogical texts tend to contain an unnatural density of the target language features. The use of corpus 
samples achieves the same density, but without compromising natural use. The richness of the corpus samples makes it possible to devise 
tasks that cover a wide range of features. For example, learners can be given the following task:

• Find verbs, phrasal verbs, or expressions containing verbs that combine with the noun diet. 

• Does diet have the same meaning in every sentence? 

• Which combinations seem to be more frequent with each meaning? 

• Find combinations with similar/opposite meaning. 

• Try to group the patterns in some way. 

This task focuses on collocation patterns, lexical meaning and frequency of occurrence. It also involves some form of practice, or “mental 
contextualisation” (McCarthy, 1990, p. 36), as learners are asked to group the patterns in a meaningful way. 


Example 2. Lexical inference 





Corpus samples also lend themselves to work on reading skills, and, in particular, to developing strategies for inferring the meaning of 
unknown lexis in the text. Although it is, of course, possible to use one or more texts to train learners in this enabling skill, corpus samples 
are superior in a number of ways. A text will contain only a few instances of the lexical item, will usually demonstrate its meaning and use 
in one context, and may not provide sufficient clues for inferring meaning. Corpus samples, on the other hand, contain a large number of 
examples which demonstrate meaning and use in diverse contexts and offer a wealth of clues. Consider the following sample task (based on 
Sample 5 below): 

In the following examples the same word is missing in each case. 

• Work out the meaning of the missing word. 

• Decide if the word has exactly the same meaning in all the sentences. 

• If there are different meanings, check whether, and how, they are related. 

• If there are different meanings, check whether there are specific words or expressions that are associated with each meaning.

1. The winners of Black & Decker 9032 cordless ???????? action drills which were the prizes in a competition which appeared in the May issue of DIY 
are as follows: 

2. It was a "maniacal" beating around the head with a claw ???????? . 

3. Endill watched Tock make a hole in the wall, holding his ???????? with both hands to stop it banging in the wrong place. 

4. He had a ???????? and banged it against the walls to restore order but nobody took any notice of him. 

5. It was taken out of the context of the early punks and placed alongside the ???????? and sickle, the IRA and PLO slogans and any other symbols 
which could be guaranteed to raise the hackles and the eyebrows of the BOF’s (remember them?). 

6. The problem could just be confined to this guitar and removing the strings and lightly tapping the frets down with a block of wood and a small
???????? could well fix it.

7. Author Ian Fleming’s original unpublished notes on his most famous creation are to go under the ???????? at London auctioneers Sotheby’s on 
December 15. 

8. There are several ways in which you can do this-I use a professional glazier’s staple gun which is both quick and efficient, but if you find this too 
expensive an investment when you first begin pressed flower work you can use a ???????? and nails instead. 

9. These mechanisms enable a stressed metal to be rapidly filled with dislocations (something like 10 per square centimetre) and thus to flow under a 
steady load or the blow of a ???????? quite easily. 

10. Next Monday, Rod’s prize goes under the ???????? through ADT, the world’s biggest car-auction firm at Blackbushe, Hants. 

11. Family’s ???????? revenge on love cheat soccer ace 

12. Chief union negotiator John Allen said it was another " ???????? blow" for the industry. 

13. But I must have felt the need for some support, because I found I’d grabbed hold of one of my ????????s-a geologist is always armed with a
????????-and when I got through to the back of the house he was there already, at the kitchen window."

14. Its plastic jacket bore a gold ???????? and sickle. 

15. Jaq rifled through the pack to find the card he used to signify himself; the black-robed High Priest, enthroned, gesturing with a ???????? . 

16. The appellant, having discovered that the man had a number of previous convictions for similar offences, equipped himself with a ???????? and a 
quantity of weak sulphuric acid and sought out the man at his place of work on two occasions. 

17. Using a small ???????? the needles were then tapped through these holes and then cut off flush with the exterior of the pipe. 

18. "Distractedly, he began to change ???????? . pouch, pipe and matches from hand to hand, dropping them and picking them up, before finally deciding 
to put the ???????? down and stuff the rest into his pockets. 

19. She had laid the ???????? there, after she had tried to break the tower window. 

20. The fist techniques of taekwondo involve lunge punches, reverse punches, back fists and ???????? fists-all of them similar to the basic karate 
punches described in the previous chapter. 


Sample 5. Lexical inference 
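
A gapped sample of this kind could be prepared along the following lines. This is a minimal Python sketch, not the procedure used to produce Sample 5: the function name make_gapped and the short list of suffixes are assumptions made for illustration, the first demonstration sentence is taken from Sample 4, and the other two are invented. As in Sample 5, an inflectional ending is left visible after the gap.

```python
import re

GAP = "????????"  # same fixed-width gap marker as in Sample 5


def make_gapped(sentences, headword, suffixes=("s", "ed", "ing")):
    """Replace the headword with a gap, leaving any inflectional ending visible.

    The suffix list is an illustrative assumption; irregular forms
    are not handled.
    """
    endings = "|".join(re.escape(s) for s in suffixes)
    # Match the headword only when it is a whole word or is followed by one of
    # the listed endings; the lookahead keeps the ending out of the match.
    pattern = re.compile(
        r"\b" + re.escape(headword) + r"(?=(?:" + endings + r")?\b)",
        flags=re.IGNORECASE,
    )
    return [pattern.sub(GAP, sentence) for sentence in sentences]


sentences = [
    "Even before you begin the diet, notice how fast you eat, and slow down.",
    "Crash diets rarely work.",   # invented example with a plural form
    "A dietitian can help.",      # invented example; should be left untouched
]
for gapped in make_gapped(sentences, "diet"):
    print(gapped)
```

Keeping the gap the same length in every sentence avoids giving away the length of the missing word, and the whole-word check stops related words such as dietitian from being gapped by accident.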


Example 3. Revision and critical examination of grammar rules 

Corpus samples can be used for revision, and offer an opportunity for learners to formulate a second opinion on traditional ELT rules (see 
Leech, 1994). The following task focuses on if-conditionals and could be used with upper-intermediate and advanced learners. Only a
small, random corpus sample is given here, which is too small to be representative. However, even in such a small sample, the limitations 
of the five-types framework become clear. 

• Check if all if-sentences are conditional sentences.

• Do all conditional sentences express a cause-result relationship?

• Allocate each conditional sentence to one of the types in your course/grammar book. Do all sentences conform to the information/ 
rules in your book? 

• In your groups, discuss any sentences that don’t fit the rules and report to the class. 

• Try to adapt the rules so that they fit more sentences, or decide on a different way to group the sentences. 

1. “My dear, dear fellow, if I had a lira for every time I’ve heard that story ... well ... “ 

2. If meat was banned, for instance, this was because the animal too has a soul (and may even be a dear departed relative!). 

3. If Gunnell herself has cashed in, she’s not been so blatant or obviously motivated by the financial side as other athletes. 

4. If ordinary children build their linguistic abilities on antecedent social and cognitive abilities, these may, in fact, be 
necessary prerequisites for the emergence of language. 

5. Payment of fines; imprisonment; amputation of right hand (the left hand is only amputated if the right has already been 
amputated) 

6. The cold did little to hinder the Orcs, for Orcs and Goblins are hardy creatures, and, if needs must, will eat any flesh no
matter how foul or what manner of creature it comes from. 

7. If the lack of energy is not remedied, the excess stress on the body can ultimately lead to prolonged illness and possible 
death. 

8. If you pull it off, I get fifteen hundred. 

9. We shall examine random additions to a file; as the principles involved do not change if the additions are grouped or 
regular in pattern, the methods used can be adapted to suit those cases. 

10. Into the power-vacuum created by the slaying of Osric and Eanfrith stepped Eanfrith’s brother, Oswald, who slew 
Cadwallon in the battle of “Heavenfield” near Hexham in the autumn of either 634 (if Eadwine was killed in 633) or 635 (if 
Eadwine did not perish until 634), and assumed the kingship of both the Deirans and the Bernicians. 

11. This enforced poverty made them easier targets for propaganda: if they left with no more than their allowance, they 
could be portrayed as shabby Untermenschen scuttling away like rats; if they managed to outwit the system, then they were 
economic criminals fleeing with stolen goods. 

12. A. If you are a tenant of a public landlord, such as a local authority, new town or housing association, and if you have a 
pressing need to move to another local authority area for a job or social reasons (for example, because you are elderly or 
handicapped), you should ask your landlord whether you can be nominated for a move under the National Mobility Scheme. 

13. “The facts speak for themselves; if Dana had any feelings for you she’d have refused my offer. 

14. If a factory chimney dumps smoke on a thousand gardens nearby it may be very expensive to collect £1 from each
household to bribe the factory to cut back to the socially efficient amount. 

15. If the sale is by sample as well as by description it is not sufficient that the bulk of the goods corresponds with the 
sample if the goods do not also correspond with the description. 

16. They are people whom we rarely consider in this House, but when there is a suicide or accident on the railway, the 
driver, and his mate if appropriate, may be mentally scarred for life by the experience. 

17. The Member of Parliament shall be eligible for nomination for selection as the prospective parliamentary candidate and, 
whether nominated or not, he or she shall be entitled to appear as if they had been nominated before the special meeting of 
the General Committee convened in accordance with section (3) of this clause and to be considered for selection as the 
prospective parliamentary candidate. 

18. Example 4:10 Tenant’s power to make time of the essence (1) if the landlord fails to take any step in the procedure for 
rent review within a period of time prescribed by this lease (whether or not that step could also have been taken by the 
tenant) the tenant may give the landlord written notice. 


19. It is perhaps as well to remember at the outset that the main injury in this particular case was a hip injury which, if it
had occurred to a younger man, would have produced an arthrodesis operation. 

20. “When I saw Ivo with a parcel he was about to mail to his wife’s cousin in Karlovy Vary, I told him I was driving that 
way, and that I’d drop it into the shop where Edita’s cousin works if he wished.” 


Sample 6. If-sentences (random sample)




Example 4. Homework tasks with a multiple focus 

The variety of information in corpus samples can provide material for homework assignments. Learners can do the tasks outside of class so 
that classroom time can be devoted to feedback discussion and perhaps some fine-tuning by the teacher. For example, learners can be 
given corpus sentences with the words sorrow and grief (Samples 7 and 8) [20] with a task that focuses on nuances of meaning, sense
relations (synonymy and antonymy) and collocation patterns: 

Examine the sentences with sorrow and grief 

• What causes sorrow or grief? 

• What other words/expressions with a similar or opposite meaning can you find? 

• What verbs, adjectives and nouns seem to combine more often with the two words? 

• Are there any frequent fixed expressions? 

• Devise a dictionary entry for each word. 

• Look up the words in dictionaries and compare your entry with the information they give. 


1. Fly over the solitary rock washed by the glacial tears of sorrow, let there be at your passing, a radiant beam over the gloomy solitary rock. 

2. Among his sacred possessions were an enormous club which could raise the slain to life again; a magic harp whose music made its listeners forget 
sorrow; an inexhaustible cauldron from which no-one is turned away hungry; and two marvellous sheep — one eternally roasting, the other forever 
feeding in readiness for slaughter. 

3. God promises his people comfort and invites us not to live in sorrow.

4. Dick expressed his great sorrow at the news of LF363 and said that he had a very soft spot for it, having "cut his teeth" on that Hurricane during his 
days with BBMF. 

5. Another couple passed by giving them a wide and sympathetic berth, leaving them alone in their sorrow. 

6. GE NUP DIMU The sorrow that overtakes the child in the womb when it knows it will be born dead 

7. As I walked down by the riverside one evening in the spring Heard a long gone song from days gone by Blown in on the great North wind Though 
there is no lonesome corncrake’s cry of sorrow and delight You can hear the cars and the shouts from bars and the laughter, and the fights May the 
ghosts that howled round the house at night never keep you from your sleep May they all sleep tight down in Hell tonight or wherever they may be ... 

8. I do not need your understanding, or your damned sorrow! 

9. "Through the night of doubt and sorrow, Onward goes the pilgrim band, Singing songs of expectation, Marching to the Promised Land." 

10. The letters were handwritten in strong block capitals, and he peered at the sign for obvious traces of sorrow, such as shakiness of the hand or tear 
stains, but there were none. 


Sample 7. Sorrow (random sample)


1. They hold their feelings in, their grief, anger, frustration and would rarely weep in front of others.

2. If, of course, her grief becomes prolonged and it seems that she is making no headway towards adjustment, it will be advisable for her to see her 
doctor, but this rarely happens. 

3. There was movement that year among all the teams and Pace’s death in March not only upset Brabham plans but also caused Ecclestone, who was very 
attached to Carlos, both as a man and as a driver, considerable grief. 

4. Short-sighted Mansell, 26, came to grief on the hard shoulder of the M6 near Pontefract, West Yorks. 

5. The second type had no power-had numbers, but no safety: numbers conferred only grief and weakness. 

6. Mom gave birth to a baby boy, and called him Winchell (not after the doughnuts, though the association would later cause him grief at school). 

7. We pass, heads to the side, in deep grief for a piece of shredded lorry tyre. 





8. He also added that Moore had never asked them to forgive her for throwing their lives into grief and chaos. 

9. I was distracted with grief this time, torn by guilt, and Eric had to look after me while I acted my part to perfection, though I say it myself. 

10. In her despair and grief Mrs McDermott also turned to alcohol for relief. 


Sample 8. Grief (random sample) 




Soft version: Teacher manipulation of corpus examples 

When using the soft version, teachers can manipulate the corpus examples in a number of ways. They can restrict the examples to a 
specific medium (writing/speech), and genre or text type (newspaper article, novel). They can also decide on the amount of text to give
learners: only a few words on either side of the key word (as in Samples 2-3 above), an entire sentence (as in Samples 4-8 above), or a
paragraph. Finally, they can edit the samples to remove sentences that they deem too difficult for the learners (Wible et al., 2002). This
manipulation should be carried out with the understanding that the adapted samples are not good guides to the frequency of a language 
item. 

We will take the pattern ‘verb + article + diet’ as an example. There are 3,458 instances of diet as a noun in the BNC, and 177 instances of
the pattern. Since there would probably not be enough classroom time for learners to examine so many examples, the sample must be 
reduced (Sample 4 above contains 43 examples). If, for the sake of convenience, the first 43 examples from the original 177 were selected, 
the sample would not be representative, since the collocates are in alphabetical order. Instead, the collocations were extracted from a 
random sample of 1,000 sentences out of the total of 3,458. In this way, the 43 examples in Sample 4 give a much more accurate picture of 
the pattern. (However, the sample is too small to give a truly representative picture.) Also, in order to save time and keep a clear focus, 
instances of diet with the meaning ‘assembly’ were removed. 
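
The difference between taking the first examples from an alphabetically ordered list and drawing a random sample can be illustrated with a small Python sketch. The data here are artificial: the verbs are taken from Sample 4, but the five identical hits per verb, the function names and the seed value are assumptions made purely for illustration, and the sketch does not reproduce the BNC query described above.

```python
import random


def first_n(lines, n):
    """Take the first n hits, as a user might do for convenience."""
    return lines[:n]


def random_n(lines, n, seed=None):
    """Draw n hits at random without replacement, keeping their original order."""
    rng = random.Random(seed)
    picked = rng.sample(range(len(lines)), n)
    return [lines[i] for i in sorted(picked)]


# Artificial concordance: five hits for each verb, sorted by collocate,
# mimicking output that is listed in alphabetical order.
verbs = ["adapt", "alter", "balance", "begin", "break", "follow", "keep", "try", "vary"]
hits = [(verb, f"... {verb} the diet ...") for verb in sorted(verbs) for _ in range(5)]

head = first_n(hits, 10)
spread = random_n(hits, 10, seed=1)

print(sorted({verb for verb, _ in head}))    # only the alphabetically earliest verbs
print(sorted({verb for verb, _ in spread}))  # verbs drawn from across the whole list
```

The first ten hits contain only adapt and alter, whereas the random draw is spread over the whole alphabetical range. A random sample of this size is still too small to be truly representative, as noted above.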

There are, of course, cases in which it is difficult to restrict the number of examples without affecting the representativeness of the sample, 
for example, when selecting examples for students of below-intermediate level. In this case, the teacher has three options: simplify the 
examples, select suitable sentences in a way that their make-up approximates the original sample, or avoid dealing with issues of 
frequency. Nevertheless, the edited sample may still be expected to contain at least some of the most frequent collocations. 

Hard version: Learners using corpora 

When learners have direct access to corpora, the focus of the lesson can be made more flexible to reflect their interests and needs. In other 
words, the teacher or learners have the option of modifying the aims and direction of the lesson on the spot according to what emerges. In 
the case of the collocations of diet, learners could also choose to examine other patterns, for example collocations of the noun diet with 
adjectives, patterns of dieting used as a noun, or diet as a verb. If the concordance or sentences do not offer enough clues, learners can get 
more text just by clicking either on the key word or a special button (depending on the software). 


Corpora and ELT methodology 

Although the use of corpora in language teaching has been linked to a “data-driven” approach (Johns, 1991a), it would be a mistake to 
assume that corpus use is restricted to any single teaching methodology. The use of corpora, in both the soft and hard versions, and either 
in a classroom context or for self-study, is compatible with all methodologies that accept explicit focus on language structure and use; in 
other words, teaching frameworks that reserve a role for noticing or awareness/consciousness-raising (e.g. Lightbown, 1985; Schmidt, 
1990; Sharwood Smith, 1981). 

Corpus examples can enhance frameworks involving explicit presentation of language features, but they are particularly relevant to 
frameworks which depend on the learners using their existing language knowledge to work out the meaning and use of new elements 
(Rutherford & Sharwood Smith, 1988), as has been shown by a number of studies utilising corpora as sources of language data (Aston, 
1997; Granger & Tribble, 1998; Johns, 1997). Although it may not be readily apparent, corpus use is also compatible with methodologies 
that advocate exposure to language, or comprehensible input (Krashen, 1985), rather than explicit focus on language, as was demonstrated 
above through the example of condensed reading. 

In other words, corpus use fits equally well within language-based approaches, with the Presentation-Practice-Production (PPP) framework 
as their best known realisation (Read, 1985; Spratt, 1985), and task-based approaches (Fotos & Ellis, 1991; Loschky & Bley-Vroman, 1993; 
Nunan, 1989; Skehan, 1998). In the case of a straightforward PPP lesson, corpus data can be used instead of made-up examples in the 
Presentation stage. But a corpus can only be utilised to the fullest if the PPP teaching framework has been modified and expanded to 
incorporate awareness-raising (Gabrielatos, 1994b) or data-driven procedures (Johns, 1997). Johns proposes a flexible sequence of 
Research, Practice and Improvisation, as he sees the learner “as ‘linguistic researcher’, testing and revising hypotheses, or as ‘language 
detective’, learning to recognize and interpret clues from context” (1997, p. 101). In formatted task-based frameworks, corpus data can be 
used in the “Pre-emptive Work” stage (Skehan, 1993), or the “Pre-task” and “Post-task” phases (Willis, 1996), which involve input or 
consciousness-raising. 

Corpus use can also enhance learner independence. According to Johns (1997, p. 101), when using corpora or corpus-based materials, 
“students define their own tasks as they start noticing features of the data for themselves – at times features that had not previously been 
noticed by the teacher” (see also Bernardini, 2002). Along the same lines, the use of corpora enhances the use of the language lab, and 
suggests a more flexible and learner-centred use for CALL materials (McEnery et al., 1997). This is not to say that the teacher’s role is 
diminished; rather, it is enriched and diversified. The teacher becomes less a provider of input and facts about language and more a 
facilitator and consultant, or, at the learner-centred end, a co-researcher. 

Finally, having learners work with samples from representative corpora of different varieties (e.g., British or American English) and 
different genres (e.g., academic English, chatroom English) will give them the rich exposure they need to become aware of the existence of 
varieties, not so much in order to learn these varieties, but to understand that English is not monolithic. 

Corpus use in learning and teaching: Prerequisites 

The availability of corpora and corpus software alone cannot ensure that language teaching will take full advantage of the opportunities 
they offer. Language teaching institutions will have to take certain courses of action; learners and teachers in their turn will have to adjust 
to changes in knowledge, skills and roles. 

What is apparent is the necessity for investment in computers, access to corpora, and the relevant software. This would be a costly move if 
a school were to opt for the hard version, but the cost would be reduced considerably if the soft version were adopted. In the first case 
there should be enough computers for each learner in a group, or at least for every two to three learners. In the second case, a school will 
only need enough computers for the teaching staff. Investment in technology, however, is just the tip of the iceberg; it is the investment in 
the users of corpora, the learners and teachers, that poses the greatest challenge for language teaching (see Kennedy & Miceli, 2001). 

Learners need to become familiar with corpora (Leech, 1997, p. 10), and in the case of the hard version, they have to be trained to use 
corpus software (Bernardini, 2002). They also have to be introduced to data-driven approaches to learning, and guided to develop the skills 
that such approaches require. They have to be guided away from the “single correct answer” concept, and the notion of fixed rules and 
exceptions, towards the recognition of patterns and alternatives, and the importance of context. The utility of corpus use does not stop at 
helping learners discover language facts for themselves; when learners (are guided to) examine corpus samples they also develop a crucial 
element of learning skills (see Cohen, 2003; Oxford, 1994), namely the ability to recognise patterns of language structure and use. To 
employ a popular analogy, in consulting a dictionary or grammar learners are given fish; by actively engaging in pattern recognition they 
learn how to fish. 

Of course, teachers need to be informed about corpora and the relevant software, and become skilled users (Renouf, 1997). This is not 
expected to take place quickly, and may be met with reluctance, or even resistance, on the part of teachers (Arkin, 2003). Teachers also 
need to be in a position to assist and guide learners in their language investigations. This means that the teachers’ awareness and 
knowledge of language will have to extend beyond the information in pedagogical materials (see Gabrielatos, 2002a, 2002b; Leech, 1994). 
Teacher preparation programmes would not only have to add components related to corpora and their uses, but also to place much greater 
emphasis on language awareness and description (see Andrews, 1994; Sinclair, 1982). 

Maintaining a sense of perspective 

English language teaching is vulnerable to pendulum swings, and has a propensity for the marketing and uncritical acceptance of “miracle 
methods” (see Decoo, 2001; Gabrielatos, 2001, 2003a). As mentioned in the introduction, many language teachers have little awareness of 
issues pertaining to corpora and the analysis of naturally occurring data, and minimal, if any, familiarity with corpus software tools. 

Corpus evidence has challenged the over-reliance on intuitions that characterises much of language teaching. Shifting the focus to actual 
language use is clearly a positive development. However, it is conceivable that the language teaching pendulum may swing to the other 
extreme: an over-reliance on corpus data. Such corpus worship could lead teachers and learners to disregard the fact that, as large as 
corpora may be or may become in the future, they cannot capture the entirety of language use because, by definition, they are only samples 
(Gavioli, 1997, p. 85). It is also worth considering that corpus studies depend on labelling and counting language elements or learner 
errors, and that these labels themselves are informed by intuitions and linguistic theories (Sinclair, 2004). A more sensible attitude towards 
intuitions and corpora as sources of language insights would be neither that intuitions are useless, nor that corpora are the ultimate 
solution. As Sinclair (1991, p. 39) states, native-speaker introspections are useful “in evaluating evidence rather than creating it.” Therefore, 
intuitions should be balanced against, and enriched by, the evidence of language in use that corpora provide (see McEnery & Wilson, 2001, 
pp. 5-12). [21] 

Corpus use, particularly in the form of concordances, is very well suited to the teaching of lexis and, to a lesser extent, grammar. As corpora 
and relevant software become more available, and corpus use becomes more widespread, language teaching may well concentrate on lexical 
and grammatical patterns, at the expense of discourse and interaction skills, that is, the teaching of reading, listening, writing, and 
speaking skills and strategies. Similarly, since corpus-based or corpus-derived materials are good vehicles for raising awareness of language 
features, language production and interaction may be given less than adequate attention. Differently stated, when working with corpora, 
learners become observers of language use. This is necessary for language learning, but not sufficient; learners also need to become 
participants in language use. 

Corpus samples, and in particular random ones, may not be suitable for all teaching contexts. Consequently, teachers and 
materials/software developers may need to manipulate the samples given to learners, a process not without pitfalls, particularly when the 
focus is on the frequency of language features. For example, corpus samples may have to be adapted when used with low-level or young 
learners, and when corpus examples contravene or offend sociocultural norms and customs. 

Corpora are also excellent sources of information about the frequency of language features in different contexts. Although such information 
is indispensable for syllabus design, it could also lead to a new kind of prescription, what we might call frequency worship, that is, 
concentrating on frequent items, patterns and structures at the expense of less frequent or idiosyncratic uses. Such practice deprives 
learners of alternative choices. Of course, it is not argued that learners should not be given frequency information, or that they should not 
be guided to become aware of the fact that some elements are more frequent than others, but that they should also be helped to realise that 
‘less frequent’ does not mean ‘less acceptable’, and that ‘infrequent’ does not mean ‘wrong’. Similarly, learners should be made aware that 
frequencies change according to context of use. 

The following analogy may help put the issue of frequency into perspective. Frequent items can be seen as the background, which is largely 
taken for granted, while the infrequent or idiosyncratic features foreground the user’s personality. Frequent items, used appropriately, help 
users blend in with a discourse community, whereas less frequent ones characterise individual language users. In view of this, language 
learners do need to be familiar with frequent features, which will enhance their understanding and production, but they should not be 
deprived of exposure to less frequent features, which will enable them to interpret nuances, enrich their own use, and help them express 
themselves in the new language. 

It is important to remember that concordance programs work with corpora, and that, consequently, the type and reliability of the derived 
information is contingent on the corpus that is used. Similarly, corpora are, ideally, representative samples of a language variety, a genre, or 
a medium (spoken or written). The misguided view of corpora as containing ‘the language’ may lead to generalisations from the 
examination of inappropriate corpora (e.g., generalising from a specialised corpus). Also, treating any large collection of texts as a corpus, 
that is, as a representative collection, may lead to conclusions based on the analysis of non-representative samples, for example, when 
using the Web as a corpus. This is not to deny that the Web is a vast, and freely available, resource of attested language use, but, rather, to 
stress that in order for the Web to be used effectively for teaching/learning purposes its users need to be aware of both its potential and 
limitations (see Kilgarriff & Grefenstette, 2003; Meyer et al., 2003; Robb, 2003; Volk, 2002). For example, the Web contains both NS and 
NNS English. 

Finally, many language teachers have only limited access to corpora and corpus tools, usually through free online concordancers provided 
for demonstration purposes. These free tools allow for a small sample of concordance lines (usually 40-50), which may or may not be 
sufficient for learners to get a clear picture of the language feature they are investigating. These samplers usually give a fixed number of 
words, typically 5-10, on either side of the key word/phrase, which may be inadequate for certain learning situations. Moreover, these free 
tools do not always give information about the medium or genre of each concordance line. Also, because of limited data and restricted use 
of corpus software, teachers may see only easily observable patterns (e.g., adjacent collocations), and not less readily apparent ones (e.g., 
discontinuous collocations). Therefore, it would be wise to investigate any limitations of free corpus tools and take them into account. 
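
To make the notion of a fixed context window concrete, the following Python sketch builds simple concordance lines with a set number of words on either side of the key word. It is an illustration under simplifying assumptions (whitespace tokenisation, a single-word key word); the function name kwic and the example sentence are hypothetical and do not reproduce the behaviour of any particular concordancer.

```python
def kwic(tokens, keyword, window=5):
    """Return (left context, key word, right context) for each hit.

    `window` is the number of words shown on either side of the key word.
    """
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, tok, right))
    return lines


# Hypothetical example text, split on whitespace.
tokens = "a balanced diet is a diet that works".split()

for left, key, right in kwic(tokens, "diet", window=2):
    # Right-align the left context so the key words line up in a column.
    print(f"{left:>12}  {key}  {right}")
```

With a narrow window, anything that falls outside it is invisible to the learner, which is one reason why discontinuous collocations are harder to spot in short concordance lines than adjacent ones.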

Conclusion 

Corpora and language description 

Corpus-based linguistic research has provided increasingly clear and accurate descriptions of native and learner language, and has 
furnished linguistics and language teaching with new insights into language structure and use. Corpora have made it possible to compare 
native intuitions with actual use, and move from prescription to description. Thanks to corpora, language description for language teaching 
has been moving from over-generalised and exception-ridden rules towards flexible and context-specific patterns. Finally, due to 
corpus-based language analysis we are now in a position to identify the frequency of particular language features both with reference to language 
use as a whole and, more importantly, with reference to specific contexts. In fact, the analysis of large corpora not only makes it possible to 
identify frequent patterns and uses, but also affords enough data to examine rare or idiosyncratic ones. 

Corpora as language teaching tools 

The increasing availability of corpora and ease of access to them, particularly through the World Wide Web, places a wealth of actual rather 
than made-up examples from different contexts at the fingertips of both teachers and learners. Corpus-based teaching is well suited to 
raising awareness of the varieties of English. Corpora also offer a welcome alternative to both specially-constructed pedagogical texts and 
authentic texts, the former being densely packed with the target language features, the latter offering only a partial picture of a language 
element. Another important contribution of corpora is the enhancement of discovery approaches to learning, which regard learners as 
language researchers. The development of corpus tools has also increased the value of the language lab. 

Corpora, learners and teachers 

The use of corpora in language teaching has helped redefine learner and teacher roles. It has reinforced learner-centred methodologies, 
and facilitated a further step away from the conception of teachers as sources of knowledge and providers of input, towards one of teachers 
as guides and facilitators, or even co-researchers. Corpus use has also introduced the need for learners and teachers to acquire new skills, 
and has placed increased emphasis on the necessity for teachers to develop their awareness of the language they teach. Finally, 
corpus-based research and teaching has the potential to empower non-native teachers and researchers, since native speaker introspection is no 
longer considered the one infallible source of insights into language structure and use. 

Corpora and language teaching: What kind of relationship? 

There is still a lot of ground to be covered before corpus use becomes a staple of language teaching and learning. In fact, if we wanted to 
describe the present relationship of most language teachers with corpora, then perhaps ‘blind date’ would be the most fitting metaphor. 
[22] However, the relationship between corpora and language teaching is definitely not ‘a fling’, as corpus-based materials and teaching 
approaches are becoming ever more pervasive in language teaching. But then again, it would be misleading to call the relationship a 
marriage, and short-sighted to wish it to become one. Corpora can and will continue to contribute greatly to language teaching in a 
multitude of ways, [23] but it would be misguided to treat them as a panacea. Corpus use is not meant to replace existing teaching 
methodologies, but to enrich and enhance them. If the time-dishonoured ELT pendulum is to be prevented from performing another one 
of its swings, then the use of corpora should not be treated as an alternative to, or rival of, existing teaching approaches, but as a welcome 
addition. 
Notes 

[1] This paper is based on my plenary address at INGED 2003 International Conference, Multiculturalism in ELT Practices: Unity and 
Diversity, organised jointly by BETA (Romania), ETAI (Israel), INGED/ELEA (Turkey) and TESOL Greece, Baskent University, Ankara, 
Turkey, 10-12 October 2003. I would like to thank Paul Baker (Lancaster University) for directing me to relevant corpus studies. Special 
thanks are due to Aliki Chappie for editing the different incarnations of this paper. 

[1] For short overviews see McEnery & Gabrielatos (2005, forthcoming), Pravec (2002). 

[2] See also the debate in the Correspondence section of ELT Journal (Carter & McCarthy, 1996; Prodromou, 1996a, 1996b) 

[3] For a comprehensive account of native-speaker and learner corpora, with relevant references and links to websites, see Xiao 
(forthcoming), http://www.lancs.ac.uk/postgrad/xiaoz/papers/corpus%20survey.htm. For a glossary of terms used in corpus linguistics see 
McEnery & Wilson (2001). 

[4] Of course, corpus compilers need to have already secured permission from the copyright holders. 

[5] For a discussion of mark-up and annotation see McEnery et al. (2005). 

[6] The annotation was carried out automatically by the Wmatrix interface (Rayson, 2001, 2003), 
http://www.comp.lancs.ac.uk/ucrel/wmatrix/, using the CLAWS part-of-speech tagger, 
http://www.comp.lancs.ac.uk/computing/research/ucrel/claws/. For a guide to the complete tagset used in the BNC see 
http://www.comp.lancs.ac.uk/ucrel/bnc2/bnc2guide.htm 

[7] The initial sample was 1,000 sentences, but was reduced to 831 after non-conditional uses of if and sentences with even if were 
removed. As the coursebooks examined presented only syntactically straightforward sentences, and in order to examine ELT materials on 
their own terms, cases of embedded or elliptical clauses and idiomatic uses were also excluded, leaving 710 if-conditionals. 

[8] See also Ferguson (2001), Fulcher (1991), Maule (1988), Wang (1991). 

[9] Although EAP is a sub-field of ESP, as both terms refer to language teaching with a better defined and narrower focus than “general 
English,” it has become common practice to treat ESP and EAP as somehow distinct, though closely related, areas (see Dudley-Evans & 
St. John, 1998; Hutchinson & Waters, 1987; Masters & Brinton, 1998; Robinson, 1991; Swales, 1985). 

[10] See also the report of the Learning Technologies Group, Oxford University Computing Services 
(http://www.oucs.ox.ac.uk/ltg/reports/plag index.xml). 

[11] On the use of terminology in language teaching see Borg (1999). 

[12] For example, identifying the topic, the gist, or specific information, or using the context, co-text and background knowledge to infer 
meaning or attitude (see Nuttall, 1996; Wallace, 1992). 

[13] Calculated on the basis of a standard A4 page, single spaced, using Times New Roman 12. 

[14] The term “authentic texts” is used here as a shorthand for “texts addressed to native speakers of a language.” For a discussion of 
“authenticity” in language teaching see Taylor (1994), Widdowson (1979, 1990). 

[15] Text-based refers to the use of a single text, or a small number of short texts, as language data. For examples of text-based language 
awareness studies see James & Garrett (1991). 

[16] The texts used were: “Diet Guidelines Aimed at Healthy People,” by Emily Gersema, 26 September 2003, 
http://www.stopgettingsick.com/templates/news template.cfm/7061; “Quick-fix diets fail fat Britons,” by Jo Revill, The Observer, 5 
January 2003, http://observer.guardian.co.uk/uk news/story/o.fiQ02.868SQi.oo.html; “You CAN Lose the Weight You Want!” 
http://www.eatandburn.com/ 2 /?rid= 22 &code=ncdfr 2 o 6 o 7 &publisher=&transid= . 

[17] The concordance was derived from the three texts using Wmatrix (see note 6). 

[18] Sample 4 is given in an alternative view to a concordance, called “sentence view.” Both sets of data are from the British National 
Corpus and were derived using the BNCweb interface, developed at the University of Zurich 
(http://homepage.mac.com/bncweb/home.html). 

[19] From this point on, “corpus sample” will be used to refer to collections of examples from a corpus in either “concordance” or 
“sentence” format. 

[20] For reasons of space, only a random sample of 10 sentences is given here; students should be given a larger corpus sample so that 
more patterns are discernible. 

[21] For a discussion of different views on the role of theory and intuitions in corpus-based research see McEnery & Gabrielatos (2005, 
forthcoming). 

[22] I would like to thank Alan Waters (Lancaster University) for prompting this metaphor by suggesting “first date.” 

[23] For example, the development and availability of multimodal spoken corpora, that is, corpora in which the transcribed text is linked 
to sound and video files (Nivre, et al., 1998), will enable researchers, materials writers, teachers and learners to use corpora to focus on 
phonological features, as well as facial expressions, gestures and body language. 
Appendices 

Appendix 1. Free/affordable corpora and corpus tools 

• British National Corpus Sampler (1 million words of written and 1 million words of spoken English): 
http://www.natcorp.ox.ac.uk/getting/sampler.html . Also, free, but restricted, access to the full BNC: 
http://sara.natcorp.ox.ac.uk/lookup.html 

• Collins Wordbanks Online English corpus (concordance and collocation samplers): 
http://www.collins.co.uk/Corpus/CorpusSearch.aspx 

• The Compleat Lexical Tutor: http://122.208.224.121 

• Michigan Corpus of Academic Spoken English (MICASE): http://www.hti.umich.edu/m/micase 

• Variation in English Words and Phrases (Mark Davies, Brigham Young University). Interface to the full British National Corpus (100 
million words): http://view.byu.edu/ 

• Web Concordancer (works with a variety of corpora): http://www.edict.com.hk/concordance/ 

• WebCorp: The Web As Corpus (University of Liverpool): http://www.webcorp.org.uk/ 

• WordNet: A Lexical Database for the English Language (Princeton University): http://www.cogsci.princeton.edu/~wn 

• WordSmith Tools: http://www1.oup.co.uk/elt/catalogue/Multimedia/WordSmithToolst.o 

Appendix 2. Online courses and information on language corpora 

• Concordancing in the classroom / Data-driven learning techniques: 
http://www.edict.com.hk/concordance/aboutweb.htm#Concordancing%20in%20the%20classrroom 

• Corpus Linguistics: A Practical Web-based Course (Lancaster University): http://www.ling.lancs.ac.uk/courses/ahaw- 
nscl/clc top.htm 

• A companion to Corpus Linguistics (McEnery & Wilson, 2001): http://bowland-files.lancs.ac.uk/monkey/ihe/linguistics/contents.htm 

• Information and Communications Technology for Language Teachers, Module 3.4, Corpus Linguistics (written by Tony McEnery and 
Andrew Wilson): http: //www.ict4lt.org/en/en mod2-4.htm . 

• Information and Communications Technology for Language Teachers, Module 2.4, Using concordance programs in the modern 
foreign languages classroom (written by Marie-Noelle Lamy, Hans Jorgen Klarskov Mortensen and Graham Davies): 
http://www.ict4lt.org/en/en mod2-4.htm# Toc48i2Q4i<u . 

• Catherine Ball, Georgetown University: http://www.georgetown.edu/faculty/ballc/corpora/tutorial.html 